12 research outputs found

Tiresias: Predicting Security Events Through Deep Learning

Author: Blond Stevens Le
Brown Peter F.
Chua Zheng Leong
Gu Guofei
Ho Grant
Liu Yang
Melicher William
Provos Niels
Richard Shin Eui Chul
Sabottke Carl
Soska Kyle
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

With the increased complexity of modern computer attacks, there is a need for defenders not only to detect malicious activity as it happens, but also to predict the specific steps that will be taken by an adversary when performing an attack. However, this is still an open research problem, and previous research in predicting malicious events only looked at binary outcomes (e.g., whether an attack would happen or not), but not at the specific steps that an attacker would undertake. To fill this gap we present Tiresias, a system that leverages Recurrent Neural Networks (RNNs) to predict future events on a machine, based on previous observations. We test Tiresias on a dataset of 3.4 billion security events collected from a commercial intrusion prevention system, and show that our approach is effective in predicting the next event that will occur on a machine with a precision of up to 0.93. We also show that the models learned by Tiresias are reasonably stable over time, and provide a mechanism that can identify sudden drops in precision and trigger a retraining of the system. Finally, we show that the long-term memory typical of RNNs is key in performing event prediction, rendering simpler methods not up to the task.
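The abstract mentions a mechanism that identifies sudden drops in precision and triggers retraining, but gives no detail. The sketch below is one possible reading, not the authors' method: it keeps a rolling window of prediction outcomes and flags a drop relative to a fixed baseline. The class name `PrecisionMonitor`, the parameters `window`, `baseline`, and `max_drop`, and the use of the fraction of correct predictions as a stand-in for precision are all assumptions made for illustration.

```python
from collections import deque


class PrecisionMonitor:
    """Rolling check on next-event prediction quality (hypothetical helper)."""

    def __init__(self, window=1000, baseline=0.9, max_drop=0.1):
        # All three values are illustrative; the abstract states none of them.
        self.outcomes = deque(maxlen=window)  # True where the prediction was right
        self.baseline = baseline              # precision expected after training
        self.max_drop = max_drop              # tolerated drop before retraining

    def record(self, predicted, actual):
        """Store whether one predicted event matched the observed event."""
        self.outcomes.append(predicted == actual)

    def precision(self):
        """Fraction of correct predictions in the window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True when the windowed precision fell more than max_drop below baseline."""
        p = self.precision()
        return p is not None and (self.baseline - p) > self.max_drop


# Usage with made-up event identifiers: two hits, then two misses.
monitor = PrecisionMonitor(window=4, baseline=0.9, max_drop=0.1)
for predicted, actual in [("e1", "e1"), ("e2", "e2"), ("e3", "e7"), ("e4", "e9")]:
    monitor.record(predicted, actual)
print(monitor.precision(), monitor.needs_retraining())
```

A fixed baseline keeps the sketch short; a deployed system would more plausibly compare against a precision measured on held-out data at training time.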

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Tiresias: Predicting Security Events Through Deep Learning

Author: Blond Stevens Le
Brown Peter F.
Chua Zheng Leong
Gu Guofei
Ho Grant
Liu Yang
Melicher William
Provos Niels
Richard Shin Eui Chul
Sabottke Carl
Soska Kyle
Publication venue: ACM Conference on Computer and Communications Security
Publication date: 01/01/2018

With the increased complexity of modern computer attacks, there is a need for defenders not only to detect malicious activity as it happens, but also to predict the specific steps that will be taken by an adversary when performing an attack. However, this is still an open research problem, and previous research in predicting malicious events only looked at binary outcomes (e.g., whether an attack would happen or not), but not at the specific steps that an attacker would undertake. To fill this gap we present Tiresias, a system that leverages Recurrent Neural Networks (RNNs) to predict future events on a machine, based on previous observations. We test Tiresias on a dataset of 3.4 billion security events collected from a commercial intrusion prevention system, and show that our approach is effective in predicting the next event that will occur on a machine with a precision of up to 0.93. We also show that the models learned by Tiresias are reasonably stable over time, and provide a mechanism that can identify sudden drops in precision and trigger a retraining of the system. Finally, we show that the long-term memory typical of RNNs is key in performing event prediction, rendering simpler methods not up to the task.

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

UCL Discovery

Probabilistic Naming of Functions in Stripped Binaries

Author: Anh Quynh Coseinc Nguyen
Bao Tiffany
Bourquin Martial
Chul Richard Shin Eui
Dai Hanjun
DeFreez Daniel
Egele Manuel
Farhadi Mohammad Reza
Flake Halvar
Gulwani Sumit
Hu Yikun
Kim Soomin
Livshits Benjamin
Nagarajan Vijayanand
Ng Beng Heng
Pewny Jannik
Rosenblum E.
TensorFlow Martín Abadi
UC Santa Barbara Computer Security Lab and Arizona State University SEFCOM.
Publication venue: ACSAC '20: Annual Computer Security Applications Conference
Publication date: 07/12/2020

Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool that combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact; an approach based on natural language processing of name components; and using Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art.
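The abstract describes Symbol2Vec only as an embedding built from random walks of function call graphs. As a rough illustration of the walk-generation step alone, the sketch below samples fixed-length walks over a toy call graph; such walks are commonly fed to a skip-gram style model, though the abstract does not say how punstrip does this. The function `random_walks`, its parameters, and the example graph are hypothetical.

```python
import random


def random_walks(call_graph, walk_length, walks_per_node, seed=0):
    """Sample walks over a call graph given as {caller: [callees]}.

    Each walk starts at a function and repeatedly steps to a randomly chosen
    callee until it reaches walk_length nodes or a function with no callees.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    walks = []
    for start in sorted(call_graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                callees = call_graph.get(walk[-1], [])
                if not callees:
                    break  # leaf function: the walk ends early
                walk.append(rng.choice(callees))
            walks.append(walk)
    return walks


# Toy call graph with made-up function names.
call_graph = {
    "main": ["parse_args", "run"],
    "run": ["read_file", "log_msg"],
    "read_file": ["log_msg"],
    "parse_args": [],
    "log_msg": [],
}
walks = random_walks(call_graph, walk_length=4, walks_per_node=2)
for walk in walks:
    print(" -> ".join(walk))
```

Functions that appear in similar walk contexts would end up with similar vectors once an embedding model is trained on these sequences, which is the property the name-similarity evaluation relies on.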

Crossref

UCL Discovery

Inference and Analysis of Formal Models of Botnet Command and Control Protocols

Author: Chia Yuan Cho
Dawn Song
Domagoj Babić
Eui Chul Richard Shin
Publication date: 01/01/2010

We propose a novel approach to infer protocol state machines in the realistic high-latency network setting, and apply it to the analysis of botnet Command and Control (C&C) protocols. Our proposed techniques enable an order of magnitude reduction in the number of queries and time needed to learn a botnet C&C protocol compared to classic algorithms (from days to hours for inferring the MegaD C&C protocol). We also show that the computed protocol state machines enable formal analysis for botnet defense, including finding the weakest links in a protocol, uncovering protocol design flaws, inferring the existence of unobservable communication back-channels among botnet servers, and finding deviations of protocol implementations which can be used for fingerprinting. We validate our technique by inferring the protocol state machine from Postfix’s SMTP implementation and comparing the inferred state machine to the SMTP standard. Further, our experimental results offer new insights into MegaD’s C&C, showing our technique can be used as a powerful tool for defense against botnets.

CiteSeerX

Crossref

On the feasibility of internet-scale author identification

Author: Arvind Narayanan
Dawn Song
Eui Chul Richard Shin
Hristo Paskov
John Bethencourt
Neil Zhenqiang Gong
Publication date: 01/01/2012

We study techniques for identifying an anonymous author via linguistic stylometry, i.e., comparing the writing style against a corpus of texts of known authorship. We experimentally demonstrate the effectiveness of our techniques with as many as 100,000 candidate authors. Given the increasing availability of writing samples online, our result has serious implications for anonymity and free speech: an anonymous blogger or whistleblower may be unmasked unless they take steps to obfuscate their writing style. While there is a huge body of literature on authorship recognition based on writing style, almost none of it has studied corpora of more than a few hundred authors. The problem becomes qualitatively different at a large scale, as we show, and techniques from prior work fail to scale, both in terms of accuracy and performance. We study a variety of classifiers, both “lazy” and “eager,” and show how to handle the huge number of classes. We also develop novel techniques for confidence estimation of classifier outputs. Finally, we demonstrate stylometric authorship recognition on texts written in different contexts. In over 20% of cases, our classifiers can correctly identify an anonymous author given a corpus of texts from 100,000 authors; in about 35% of cases the correct author is one of the top 20 guesses. If we allow the classifier the option of not making a guess, via confidence estimation we are able to increase the precision of the top guess from 20% to over 80% with only a halving of recall.
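The precision and recall trade-off in the last sentence follows from letting the classifier abstain when its confidence is low. The sketch below shows the arithmetic under simple assumptions: each top guess carries a confidence score, guesses below a threshold are withheld, precision is measured over answered cases, and recall over all cases. The function name, the threshold values, and the toy scores are hypothetical and are not taken from the paper.

```python
def precision_recall_with_abstention(predictions, threshold):
    """Compute (precision, recall) when low-confidence guesses are withheld.

    predictions: list of (confidence, is_correct) pairs, one per test document.
    Precision counts only answered documents; recall counts all documents.
    """
    answered = [ok for conf, ok in predictions if conf >= threshold]
    if not answered:
        return None, 0.0  # the classifier abstained on every document
    correct = sum(answered)
    return correct / len(answered), correct / len(predictions)


# Toy scores: correct guesses tend to have higher confidence.
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.4, True), (0.3, False), (0.2, False)]

print(precision_recall_with_abstention(toy, threshold=0.0))   # answer everything
print(precision_recall_with_abstention(toy, threshold=0.75))  # answer only when confident
```

With the toy data, raising the threshold lifts precision from 0.5 to 1.0 while recall falls from 0.5 to about 0.33, the same kind of trade the abstract reports at much larger scale.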

CiteSeerX

Princeton University Open Access Repository

Crossref

Grey-box analysis and fuzzing of automotive electronic components via control-flow graph extraction

Author: Bao Tiffany
Chul Richard Shin Eui
den Herrewegen Jan Van
Fowler S
Garcia D.
Gustafson Eric
Hanna Steve
Ruge Jan
Stephens Nick
Verdult Roel
Xia Pei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/12/2020

Crossref

University of Birmingham Research Portal